Unsupervised Learning of Dependency Structure for Language Modeling
Authors
Abstract
This paper presents a dependency language model (DLM) that captures linguistic constraints via a dependency structure, i.e., a set of probabilistic dependencies that express the relations between the headwords of the phrases in a sentence as an acyclic, planar, undirected graph. Our contributions are three-fold. First, we incorporate the dependency structure into an n-gram language model to capture long-distance word dependencies. Second, we present an unsupervised learning method that discovers the dependency structure of a sentence using a bootstrapping procedure. Finally, we evaluate the proposed models on a realistic application (Japanese Kana-Kanji conversion). Experiments show that the best DLM achieves an 11.3% error rate reduction over the word trigram model.
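To make the idea concrete, here is a minimal sketch (not the paper's implementation) of how a dependency structure can be combined with an n-gram model: each word is scored by interpolating a surface bigram probability with a head-dependent probability. All function names, the interpolation weight `lam`, and the toy probabilities are assumptions for illustration.

```python
from collections import defaultdict

def dlm_score(words, heads, dep_prob, bigram_prob, lam=0.5):
    """Interpolate a word-bigram probability with a head-dependent
    probability. heads[i] is the index of word i's head (-1 = root)."""
    score = 1.0
    for i, w in enumerate(words):
        prev = words[i - 1] if i > 0 else "<s>"
        head = words[heads[i]] if heads[i] >= 0 else "<root>"
        # Linear interpolation of surface context and dependency context.
        p = lam * bigram_prob[(prev, w)] + (1 - lam) * dep_prob[(head, w)]
        score *= p
    return score

# Toy distributions (hypothetical values, for illustration only).
bigram = defaultdict(lambda: 0.01, {("<s>", "she"): 0.2,
                                    ("she", "reads"): 0.1,
                                    ("reads", "books"): 0.1})
dep = defaultdict(lambda: 0.01, {("<root>", "reads"): 0.3,
                                 ("reads", "she"): 0.2,
                                 ("reads", "books"): 0.2})

# "reads" is the root; "she" and "books" depend on it.
p = dlm_score(["she", "reads", "books"], [1, -1, 1], dep, bigram)
# p = 0.2 * 0.2 * 0.15 = 0.006
```

A real DLM would estimate the head-dependent distribution from data and sum over candidate dependency structures rather than assume a single one.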
Similar References
An Unsupervised Parameter Estimation Algorithm for a Generative Dependency N-gram Language Model
We design a language model based on a generative dependency structure for sentences. The parameter of the model is the probability of a dependency N-gram, which is composed of lexical words with four kinds of extra tags used to model the dependency relation and valence. We further propose an unsupervised expectation-maximization algorithm for parameter estimation, in which all possible dependenc...
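The core of such unsupervised estimation can be sketched with a toy EM step (an assumption for illustration, not the cited paper's algorithm): each word's head is a latent choice over the other words in the sentence; the E-step computes a posterior over candidate heads, and the M-step renormalizes the expected counts per head.

```python
from collections import defaultdict

def em_step(sentences, dep_prob):
    """One EM iteration over head-dependent probabilities with latent heads."""
    counts = defaultdict(float)
    for words in sentences:
        for i, w in enumerate(words):
            cands = [h for j, h in enumerate(words) if j != i]
            weights = [dep_prob[(h, w)] for h in cands]
            z = sum(weights)
            for h, wt in zip(cands, weights):
                counts[(h, w)] += wt / z  # E-step: posterior over heads
    # M-step: renormalize expected counts per head.
    totals = defaultdict(float)
    for (h, w), c in counts.items():
        totals[h] += c
    return {(h, w): c / totals[h] for (h, w), c in counts.items()}

sents = [["she", "reads", "books"], ["she", "reads"]]
probs = defaultdict(lambda: 1.0)  # uniform start
for _ in range(5):
    probs = defaultdict(lambda: 1e-9, em_step(sents, probs))
```

The actual dependency N-gram model also conditions on direction and valence tags and restricts candidates to well-formed (projective, acyclic) structures, which this toy omits.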
Two Approaches for Building an Unsupervised Dependency Parser and Their Other Applications
Much work has been done on building parsers for natural languages, but most of it has concentrated on supervised parsing. Unsupervised parsing is a less explored area, and unsupervised dependency parsing has hardly been tried. In this paper we present two approaches for building an unsupervised dependency parser. One approach is based on learning dependency relations and the other on lea...
Covariance in Unsupervised Learning of Probabilistic Grammars
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayes...
Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
Dependency parsing is a central task in the field of Natural Language Processing (NLP). The task involves the automatic labeling of natural language sentences with dependency structures, such that each word is labeled as the dependent of another word in the sentence (its syntactic head). This formalism is important, both in the linguistic aspect (Mel’čuk, 1988) and in empirical aspects, as it u...
Phrase Dependency Machine Translation with Quasi-Synchronous Tree-to-Tree Features
Recent research has shown clear improvement in translation quality by exploiting linguistic syntax for either the source or target language. However, when using syntax for both languages (“tree-to-tree” translation), there is evidence that syntactic divergence can hamper the extraction of useful rules (Ding and Palmer 2005). Smith and Eisner (2006) introduced quasi-synchronous grammar, a formal...
Journal:
Volume/Issue:
Pages: -
Publication year: 2003